CONSTRUCTION OF DECISION TREES
Edwin Roger Banks 


The construction of optimal decision trees for the problem stated
within can be accomplished by an exhaustive enumeration. This paper
discusses two approaches. The section on heuristic methods gives
mostly negative results (e.g., there is no merit factor that will
always yield the optimal test, etc.), but most of these methods do
give good results. The section entitled "Exhaustive Enumeration
Revisited" indicates some powerful shortcuts that can be applied to
an exhaustive enumeration, extending the range of this method.





INTRODUCTION 
A. The problem 


This paper considers the optimal procedure for determining
whether a network of switches is open or closed. Each switch i
has an a priori probability p_i of being closed and an associated
cost C_i to determine the condition of the switch. The problem can
also be expressed in Polish notation as, for example:


    (AND (OR (AND T_1 T_2) T_3) (OR T_4 T_5))


where the T_i are tests with true or false outcomes and associated
probabilities p_i (of being true) and costs C_i. A third formulation
of the problem will be used in this paper. The problem tree for the
above Polish form is shown with the network representation:

[Figure: network representation and PROBLEM TREE (b. short form)]


Bridge networks will be disallowed. 

Our goal is to obtain the decision tree of minimum expected cost.
A typical decision tree for the above problem tree is shown:

[Figure: typical decision tree, where / indicates an open circuit and
the other terminal mark indicates a closed circuit.]








The expected cost of this decision tree is:

    C_3 +
    p_3 (C_4 + q_4 C_5) +
    q_3 (C_1 + p_1 (C_2 + p_2 (C_4 + q_4 C_5)))

where q_i = 1 - p_i is the a priori probability of switch i being
open. The above formula was obtained from the tree as follows: T_3 is
the first test; it is made unconditionally, at cost C_3. If the test
results in switch 3 being closed, then the parenthesized part of the
second line becomes the expected cost of determining the remaining
part of the circuit, and switch 3 is closed with probability p_3.
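
The way the expected cost is read off a decision tree can be sketched
in a few lines of Python. This is only an illustration: the tuple
encoding of the tree, the function name expected_cost, and the
numerical values of p and c are assumptions made here, and the tree
is the five-switch example as read from the formula above.

```python
def expected_cost(tree, p, c):
    """Expected cost of a decision tree.

    A tree is None (circuit state known, no further cost) or a tuple
    (switch, subtree_if_closed, subtree_if_open).
    """
    if tree is None:
        return 0.0
    switch, if_closed, if_open = tree
    return (c[switch]
            + p[switch] * expected_cost(if_closed, p, c)
            + (1.0 - p[switch]) * expected_cost(if_open, p, c))


# Test 4, then test 5 only if switch 4 is open.
tail = (4, None, (5, None, None))
# Test 3 first; if open, test 1, then 2, then the same tail.
example_tree = (3, tail, (1, (2, tail, None), None))

# Illustrative values only (not from the paper).
p = {i: 0.5 for i in range(1, 6)}
c = {i: 1.0 for i in range(1, 6)}

print(expected_cost(example_tree, p, c))  # 2.6875
```

Each level of the recursion adds the cost of the test at that node
and weights the two subtrees by p and q, which reproduces the nested
form of the formula.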


Consideration of this problem probably originated (see Berlekamp)
in an attempt to optimize telephone switching circuits.+ Another area
is problem solving, where the solution to a problem can involve solving
sub-problems. In our problem tree, the AND represents a division
of the problem into sub-problems which jointly must be solved, and the
OR, those for which the solution of any sub-problem solves the problem.


+In this problem the cost is time. 



Let n be the number of switches, and X_n the number of possible
decision trees. To be counted in X_n we require only that the tree
has no repeated tests. (Otherwise X_n = \infty.) Then X_n becomes:

    X_n = n (X_{n-1} + 1)^2   and   X_1 = 1

or

    X_n = 1, 8, 243, 238144, 280000000000, ...  for n = 1, 2, 3, 4, 5, ...

This formula can be criticized because it counts incomplete decision
trees, but if X_n is to rule out these trees, then it becomes a
function of how the n switches are interconnected. In any event, a
procedure which attempts to find optimal trees by exhaustive
enumeration and comparison will be practically limited to n = 4 or
less.
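
The growth of the tree count is easy to check numerically. The short
Python sketch below simply iterates the recurrence X_n = n (X_{n-1} +
1)^2 with X_1 = 1; the function name count_trees is a label chosen
here, not one from the paper.

```python
def count_trees(n):
    """Number of decision trees on n switches with no repeated tests.

    The root picks one of k switches, and each of its two branches is
    either empty or a tree on the remaining k - 1 switches.
    """
    x = 1  # X_1
    for k in range(2, n + 1):
        x = k * (x + 1) ** 2
    return x


print([count_trees(n) for n in range(1, 6)])
# [1, 8, 243, 238144, 283565205125]
```

The fifth value, roughly 2.8 x 10^11, is what puts plain enumeration
out of reach beyond four switches.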

Interestingly, out of the 238000 trees for the 4-switch network,
for example, there are exactly eight which qualify as a possibly
optimal decision tree, regardless of the values of the C_i, the p_i,
or how the switches are connected! Results of this sort are reported
in the section titled "EXHAUSTIVE ENUMERATION REVISITED".

The above results were discovered after an initial effort to
apply heuristics to the problem. Several interesting theorems,
including the failure theorem which shows that counter-examples exist
for a broad class of heuristics, are reported in the next section.

The last section considers the technique of test-at-a-time, i.e.,
instead of constructing a good decision tree which is simply read in
order to choose a test, how can a good test be chosen without the
tree? The merit of this technique is that the decision tree is
typically very large, requiring storage space. (If a bushy problem
tree is assumed, the average path length of the decision tree will
be n/2, giving a total size of about 2^{n/2}.)



HEURISTIC TECHNIQUES


The first class of heuristics involves the merit-factor approach.

A merit factor is calculated for each switch and the largest (or
smallest) merit factor determines the test. For each possible
outcome of the test (open or closed) the circuit diagram is simplified
and new merit factors are computed. The merit factor may be a function
of the costs or probabilities of any or all of the switches and of
the structure of the network itself, and other factors. We will
design a few merit factors and analyze them.

A useful quantity to include in a merit factor is the a priori
probability that the entire circuit is closed. Let P designate this
quantity, which is easily evaluated from the p_i and the problem tree.
We will also use Q = 1 - P as the total probability of an open circuit.
Two other useful quantities are \Delta P_i and \Delta Q_i, where
\Delta P_i will represent the increase which results in P if switch i
is closed. Similarly, \Delta Q_i will represent the increase in Q for
switch i open.
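
As a hedged illustration of how P, \Delta P_i and \Delta Q_i can be
evaluated from a problem tree, the Python sketch below walks a nested
AND/OR tuple. The tuple encoding, the function names, and the sample
probabilities are assumptions made for this example; it also assumes
independent switches, each appearing once (no bridge networks).

```python
from math import prod


def prob_closed(node, p):
    """Probability P that the (sub)network described by node is closed."""
    if isinstance(node, int):          # a single switch
        return p[node]
    op, *children = node
    if op == "AND":                    # series: every child must be closed
        return prod(prob_closed(ch, p) for ch in children)
    # "OR", parallel: closed unless every child is open
    return 1.0 - prod(1.0 - prob_closed(ch, p) for ch in children)


def delta_p(node, p, i):
    """Increase in P if switch i is found closed."""
    return prob_closed(node, {**p, i: 1.0}) - prob_closed(node, p)


def delta_q(node, p, i):
    """Increase in Q = 1 - P if switch i is found open."""
    return prob_closed(node, p) - prob_closed(node, {**p, i: 0.0})


# Five-switch example network, with illustrative probabilities.
network = ("AND", ("OR", ("AND", 1, 2), 3), ("OR", 4, 5))
p = {i: 0.5 for i in range(1, 6)}

print(prob_closed(network, p))   # 0.46875
print(delta_p(network, p, 3))    # 0.28125
print(delta_q(network, p, 3))    # 0.28125
```

Fixing a switch closed or open is modeled by setting its probability
to 1 or 0, so no separate network simplification step is needed for
this calculation.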

As tests are made, a plot of P against cost can be made:

[Figure: P as a Function of Costs Drawn for a Particular Decision
Tree. The goal is the top or bottom line.]



This diagram suggested most of the following merit factors.

1. Heuristic: make that test which gives the greatest expected
   change in P per unit cost. Thus we define

       F_1 = (p_i \Delta P_i + q_i \Delta Q_i) / C_i

2. Heuristic: make that test which gives the greatest percent
   change in P per unit cost.

       F_2 = (p_i \Delta P_i / P + q_i \Delta Q_i / Q) / C_i

3. Heuristic: weight the factors of F_1 by P and Q. Thus we
   multiply the expected increase by P, which is the probability
   that an increase is desired.

       F_3 = (P p_i \Delta P_i + Q q_i \Delta Q_i) / C_i

4. Heuristic: make the test which yields the largest expected
   percent reduction in distance remaining to the goal:

       F_4 = (p_i \Delta P_i / Q + q_i \Delta Q_i / P) / C_i

THEOREM I. The above heuristics are equivalent (within constants)
to each other and to the simple merit factor F_5:

    F_5 = p_i \Delta P_i / C_i

To prove THEOREM I we need LEMMA I.

LEMMA I. P is individually linear in p_i. That is, for functions G
and H:

    P = G(p_1, ..., p_{i-1}, p_{i+1}, ..., p_n)
        + H(p_1, ..., p_{i-1}, p_{i+1}, ..., p_n) p_i
Inductive proof of LEMMA I.

Any network (including bridge type networks) can be built
by starting with one switch and two nodes and adding switches and
nodes by use of two methods.

I. By connecting two existing nodes by a switch.

II. By adding a switch and a node between a switch and a node.

[Figure: I. CONNECTING TWO NODES BY A SWITCH; II. INSERTING A SWITCH
AND NODE BETWEEN A SWITCH AND NODE]

The lemma is obviously true for a circuit with only one switch and
two nodes. Consider case I first. Let us add switch i. P' is
the new probability and P is individually linear by the induction
hypothesis.

    P' = p_i P'' + (1 - p_i) P

P'' is a network with one less node and is therefore satisfactory.
Obviously P' is individually linear. Considering case II as we add
switch i:

    P' = p_i P + (1 - p_i) P''

again still individually linear in p_i. Q.E.D.






Now we can prove THEOREM I. First we show the interesting
equivalence

    p_i \Delta P_i = q_i \Delta Q_i   for all i.

Since P is individually linear in p_i we can write

    P = G + H p_i

where G and H are the functions of the probabilities of the other
switches. \Delta P_i is the increase in P if p_i = 1, or

    \Delta P_i = (G + H * 1) - (G + H p_i) = H (1 - p_i) = H q_i

And since

    G + H p_i = G + H (1 - q_i) = (G + H) - H q_i
    and Q = 1 - P = (1 - G - H) + H q_i

we can show in a similar manner as above for \Delta P_i that

    \Delta Q_i = H p_i

Therefore

    p_i \Delta P_i = q_i \Delta Q_i = H p_i q_i

Let us use X as a shorthand for p \Delta P to prove the equivalence of
our merit factors.

    F_1 = (p \Delta P + q \Delta Q) / C = (X + X) / C = 2 X / C

    F_2 = (X/P + X/Q) / C = (1/PQ) X / C   since P + Q = 1

and P and Q are initially the same for all tests.

    F_3 = (P X + Q X) / C = X / C

    F_4 = (X/Q + X/P) / C = (1/PQ) X / C

Also any monotonically increasing functional combination of F_1 ...
F_5 will be equivalent to F_5 in the sense that it will yield the same
test and hence the same decision tree.
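
The equivalence can also be checked numerically. The Python sketch
below is a brute-force check, not a method from the paper: it
enumerates all switch states to get P, then compares the orderings
produced by the five merit factors. The circuit (the five-switch
example as read from the Polish form), the probabilities, the costs,
and all function names are assumptions chosen for illustration.

```python
from itertools import product


def circuit(s):
    """Assumed example network: ((1 AND 2) OR 3) AND (4 OR 5)."""
    return ((s[1] and s[2]) or s[3]) and (s[4] or s[5])


def prob_closed(p, fixed=None):
    """P by summing over all states; `fixed` pins chosen switches."""
    fixed = fixed or {}
    free = [i for i in p if i not in fixed]
    total = 0.0
    for bits in product([False, True], repeat=len(free)):
        weight = 1.0
        for i, closed in zip(free, bits):
            weight *= p[i] if closed else 1.0 - p[i]
        if circuit({**fixed, **dict(zip(free, bits))}):
            total += weight
    return total


def merit_factors(p, c):
    P = prob_closed(p)
    Q = 1.0 - P
    table = {}
    for i in p:
        dP = prob_closed(p, {i: True}) - P    # increase in P if i closed
        dQ = P - prob_closed(p, {i: False})   # increase in Q if i open
        x, y = p[i] * dP, (1.0 - p[i]) * dQ   # x and y should agree
        table[i] = {
            "X": x, "Y": y,
            "F1": (x + y) / c[i],
            "F2": (x / P + y / Q) / c[i],
            "F3": (P * x + Q * y) / c[i],
            "F4": (x / Q + y / P) / c[i],
            "F5": x / c[i],
        }
    return table


p = {1: 0.3, 2: 0.6, 3: 0.5, 4: 0.7, 5: 0.2}   # illustrative values
c = {1: 2.0, 2: 1.0, 3: 4.0, 4: 3.0, 5: 1.5}   # illustrative values
table = merit_factors(p, c)
for name in ("F1", "F2", "F3", "F4", "F5"):
    ranking = sorted(table, key=lambda i: table[i][name], reverse=True)
    print(name, ranking)
```

For these values every factor should produce the same ordering of the
switches, which is the sense in which the heuristics are equivalent.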


One might expect that the time to find the largest p_i \Delta P_i / C_i
would be at least proportional to the number of switches. Thus to
construct a decision tree would require a time proportional to n
times the number of tests in the entire decision tree. Actually
a time proportional to log n is all that is needed after the first
test is chosen, if all the previous results are saved in tree form.
We have used 2^{n/2} as the expected size of the decision tree.
(Total size of a binary tree is twice the number of end points.)

The following intuitively obvious THEOREM suggests why the decision
tree should be so large.

THEOREM II. There is at least one path of length n in every (complete)
decision tree, i.e., it is always possible that when using the
decision tree, the state of every switch would have to be determined
to determine the state of the circuit (open or closed).

Inductive Proof of THEOREM II.

The theorem is obviously true for one switch. There are two
cases: either the switch is connected in parallel or in series. If
in parallel and if open, then there results a network with one
fewer switch and THEOREM II applies to this. A parallel argument
(pardon the pun) applies to the series connection. Thus an average
depth of about n/2 results. It should be noted that for any
technique yielding optimal trees, the time required is probably at
least proportional to the size of the tree it is building, giving
2^{n/2} as a lower bound. (It is realized that perhaps the tree
contains many duplicate sub-structures, in which case a time less
than the above may exist.)



FAILURE THEOREM


THEOREM III. There does not exist a merit factor F_i such that the
optimal decision tree is always found by picking that switch with
maximum (or minimum) F_i. F_i can be a function of p_i, C_i, the
problem tree structure, the other p_j (j \ne i), indeed of anything
except the other switches' costs, C_j for j \ne i.

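Proof.

Consider the following network:

[Figure: network of switches 1, 2, 3, with all p_i = 1/2. F_i is
restricted only to be independent of the other switches' costs.]

Also consider the following assignments of costs to the switches,
with optimal decision trees shown below. Only C_3 differs between
the two cases.

    First case:    C_1 = 4,   C_3 = 1.99
    Second case:   C_1 = 4,   C_3 = 2.01

[Figure: optimal decision trees for the two cost assignments]

In the first case F_2 > F_1 while in the second case F_1 > F_2. But
the only change was in C_3, and F_1 and F_2 were not allowed to depend
on C_3.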




Berlekamp gave the following interesting example whose optimal
decision tree is shown. Again the numbers are costs; all p_i = 1/2.

[Figure: Berlekamp's example and its optimal decision tree]

However, the following two trees must be equivalent, where A, X, Y,
and Z are substructures.

[Figure: EQUIVALENT TREES]

The interesting fact is that the costs 1 and 1' in the example
can be varied independently over a small range while both forms of
the decision tree remain equivalent.


Winston^2 has devised some heuristics for building trees that
add tests and then do local improvements which can move the new tests
to a position of lesser depth in the tree. His decision trees are for
a more general problem in which the tests are not simple true or false,
but divide a set of objects into two classes.

Reinwald and Soland present an algorithm for finding the optimal
tree, but Winston claims that it is only slightly better than exhaustive
enumeration.







BERLEKAMP'S RESULTS

Berlekamp^1 defined a couple of merit factors but used them in
a different way from our usage. He defined a parallel merit factor
PMF = p_i/C_i and a series merit factor SMF = q_i/C_i and showed that
for a pure parallel or pure series network, the PMF and SMF (when
evaluated and ordered for all switches) would determine the optimal
tree. In fact, extending this to a parallel-series (i.e., a parallel
connection of pure series circuits) or series-parallel (beads on a
string) network, he found the following algorithm would give the
optimal tree. We illustrate here only the series-parallel case.

[Figure: Example of Series-Parallel network.]

I. Replace each bead by a single switch whose cost is the expected
cost of the bead and whose probability is the probability of the
entire bead.

II. Calculate SMF to pick the bead for the first test.

III. Within the bead, calculate PMF to determine the switch. This
switch is the first test.

IV. Simplify the network (for both cases, open and closed switch) and
start over.
Although the method always yields the optimal tree for problems
whose problem tree has a depth of two, unfortunately attempts to
generalize to higher depths fail. Berlekamp gave the following
counter-example (depth of 3), where the numbers are the costs of the
tests.

[Figure: COUNTER-EXAMPLE and OPTIMAL DECISION TREE]


















Although the optimal tree above has an expected cost of 208.5, the
application of Berlekamp's method yields a tree with expected cost
of 230.25. In fact Berlekamp found a formula for how much the method
costs:

BERLEKAMP'S THEOREM IV. Let T_ab be the cost of the best strategy in
which parallel branches a and b are looked at consecutively and T_opt
the cost of the optimal strategy. Then

    T_ab \le T_opt + (p_a/C_a - p_b/C_b) C_a (C_a + q_a C_b)

(assuming p_a/C_a \ge p_b/C_b). The first factor is the difference in
the merit factors; the second is the cost of the first branch, and
the third is the cost of the equivalent combination of the two branches.
COROLLARY. If the merit factors are equal, the branches may be
combined with no loss in expected cost.

However, Berlekamp did show the following useful theorem.

BERLEKAMP'S THEOREM III. If the optimal strategy starts in a particular
bead B_i, it will start at that branch of B_i which has the highest
parallel merit factor.

Attempts to generalize this theorem also failed, as shown in the
next theorem.

BERLEKAMP'S FALSE CONJECTURE IV. If the optimal strategy starts in
a substructure N, it will start at the same place that it would have
started in N alone. (Not true.)


An attempt to get the effect of the entire structure into the
problem involved substituting electrical resistors for the switches.
This type of operation has been known to work for some other problems.
We would like to have the resistance be a function of the probability
such that the best switch is found by taking that switch which has
i) maximum current, ii) maximum voltage drop, or iii) maximum power
dissipation per unit cost, assuming a one volt applied voltage.

If such a method is to work for our problem, it should satisfy the
following requirements:

I. For p = 0, R = \infty and for p = 1, R = 0.

II. Resistors should combine into equivalent resistors by the
parallel and series combination laws.

Unfortunately the combination laws specify the form of the equation
and I. specifies the initial conditions, giving two different equations
depending on whether R is in series or parallel.
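
The two equations themselves are not legible in this copy. As a hedged
illustration only, the Python sketch below gives one pair of functions
that satisfies requirements I and II (up to a scale factor): a
logarithmic form for series connection and a reciprocal-logarithm form
for parallel connection. These forms and the function names are
assumptions made here and are not claimed to be the ones in the paper.

```python
import math


def r_series(p):
    """Assumed series law: resistances add when probabilities multiply."""
    return math.inf if p == 0 else -math.log(p)


def r_parallel(p):
    """Assumed parallel law: conductances add when open-probabilities multiply."""
    if p == 0:
        return math.inf
    if p == 1:
        return 0.0
    return -1.0 / math.log(1.0 - p)


p1, p2 = 0.3, 0.6

# Series: the pair is closed with probability p1 * p2.
print(r_series(p1 * p2), r_series(p1) + r_series(p2))

# Parallel: the pair is open with probability (1 - p1) * (1 - p2).
p_par = 1.0 - (1.0 - p1) * (1.0 - p2)
print(1.0 / r_parallel(p_par), 1.0 / r_parallel(p1) + 1.0 / r_parallel(p2))
```

Each printed pair should agree, which shows why a single resistance
function cannot serve both connection types: the series law fixes one
curve and the parallel law fixes a different one.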

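[Figure: the series and parallel forms of R plotted against p, for p
from 0 to 1]

The fact that the two curves are so similar suggests that some
combination of the series and parallel forms of R should give good
results.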










EXHAUSTIVE ENUMERATION REVISITED 


In the set of all possible decision trees for a large number of
switches, n, only a very small fraction of these are possible as
optimal trees. Some of these have identical costs.

The following results assume that the problem tree is slightly
modified to contain the switches ordered by their (Berlekamp) merit
factors at any particular level, as shown in the example.

[Figure: PROBLEM TREE (depth = 3), with the switches at each level
ordered by merit factor, e.g. PMF_a > PMF_b > PMF_c and
SMF_d > SMF_e > SMF_f]


Berlekamp's algorithm can be used on any problem whose depth is two
or less. To help eliminate non-optimal trees, THEOREM IV can be used.


THEOREM IV. At any node in a decision tree, the maximum number of
remaining tests assuming the outcome is a closed switch and the
maximum number for an outcome of open switch are different numbers.

Proof.

In the proof of THEOREM II it was noted that one outcome gave
a subproblem of exactly one less switch. But the other case must have
cut off part of the circuit, giving a still smaller network. For
instance, if the switch was connected in series and was open, then at
least two fewer switches result in the subproblem.

This theorem immediately eliminates all trees that end in either
of the following structures.

[Figure: the two excluded terminal structures]

THEOREM V. No structure of the following form can exist as an optimal
decision tree.

[Figure: the excluded structure]

This theorem states that any two switches a and b cannot appear at
the same level as a parallel connection in one case and series in
another.

Argument.

In the problem tree, a and b have a youngest common ancestor
which is at the parallel or series level, but whichever, it cannot
change. (This theorem does not apply to bridge-type networks.)

THEOREM VI. In the 3 switch network composed of switch a in series
with parallel b and c, if the SMF for a is greater than the maximum
SMF for b and c, then the tests will be applied in order of SMF.
Similar results hold for the dual case.



The simplest network for which Berlekamp's method fails (i.e.,
depth greater than two) has four switches. This is the only 4 switch
network for depth of three.

[Figure: the four-switch network, its DUAL NETWORK, and the PROBLEM
TREE]


Since the dual network is solved exactly the same as the original 
network giving identical decision trees, only one of each dual pair 
is to be considered. 






The formula for X_n (the number of decision trees) gives about
238,000 trees for the 4 switch problem. However, no more than the
following eight trees need be considered. All the rest must be
non-optimal.

[Figure: the eight candidate decision trees]

How did these eight arise? First, BERLEKAMP'S THEOREM III plus the
fact that the switches are ordered by their merit factors before we
start eliminates switch 4 from consideration immediately. Now consider
proving, for example, that the following tree is non-optimal. We
will generate a sequence of equivalent trees, the last of which is
more costly than one of the above eight.

[Figure: sequence of equivalent trees, the last of which is more
costly than one of the eight]

The above eight trees can be lumped into three classes, where
within each circle the depth is two or less and Berlekamp's method
can be used to find the optimal tree.

[Figure: the three classes of trees]

Only these three cases now need be considered. (We are now assuming
that if the depth of the problem tree is less than three, Berlekamp's
method is applied; but if the depth is three, then these three trees
are considered.)

Going to five switches there are four meaningful problem trees
of depth three and one of depth four.

[Figure: FIVE SWITCH DEPTH = 3 TREES]

[Figure: FIVE SWITCH DEPTH = 4 TREE]

Depending on which of the above problem trees is being considered,
there are only 7, 5, 5, 5, or 8 trials to be made, respectively
(see Appendix I). For n = 5, X_n = 2.8 x 10^11 approximately.

Optimal trees were found for each of the listed 4 switch trees,
but perhaps some of the 5 switch trees can be eliminated by
further study.

Considering six switches, there are at most five ways to pick
the first one since at least one way violates the theorem of Berlekamp.
By THEOREM V one of the sub-problems (after the first one is made)
has five switches and the other has four or less. Thus the greatest
time required is proportional to 5 (8 + 3) = 55.


To further speed the operation, some other techniques could
have been used. For example, we have seen that some decision trees
give identical costs and only one member of each family should be
considered. Another area for improvement concerns avoiding duplication
of effort on those sub-structures which appear many times within a
larger tree. Furthermore the techniques of this chapter have completely
abandoned consideration of the values of the p_i and C_i of the
switches. Some function of these values could probably divide the
exhaustive enumeration into binary halves, greatly speeding the
calculations. It is very probable that these and other pruning
techniques could be developed to the point that fairly large networks
(say 20 switches) could be handled in a reasonable amount of time. But
if we remember that the size of a typical decision tree itself grows
as 2^{n/2}, we might find that about 20 to 30 would represent an upper
bound on all techniques.


TEST-AT-A-TIME METHOD


Since for large values of n the size of the decision tree becomes
unreasonably large, it becomes desirable to have a procedure which
determines the switches as the problem develops. Since the
p \Delta P / C method could be used (time to find the maximum value of
this merit factor was proportional to log n) in a look-ahead scheme as
a static evaluation function, tests could be determined very rapidly.
Unfortunately for the methods of the preceding section, it appears that
the entire tree would have to be determined to find the first test.


APPENDIX I


Two five-switch problems, depth = 3.